Policy Distillation
Abstract
Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance. In this work, we present a novel method called policy distillation that can be used to extract the policy of a reinforcement learning agent and train a new network that performs at the expert level while being dramatically smaller and more efficient. Furthermore, the same method can be used to consolidate multiple task-specific policies into a single policy. We demonstrate these claims using the Atari domain and show that the multi-task distilled agent outperforms the single-task teachers as well as a jointly-trained DQN agent.
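The abstract does not state the training objective, so the following is only a minimal sketch of one plausible way to train a student network to imitate a teacher: it assumes the teacher's Q-values are turned into a sharpened softmax distribution and the student is penalized by the KL divergence from it. The function names, the `temperature` parameter, and its default value are illustrative assumptions, not details taken from the text above.

```python
import numpy as np


def softmax(x, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(x, dtype=np.float64) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # shift for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(teacher_q, student_logits, temperature=0.1):
    """Mean KL(teacher || student) over a batch of states.

    teacher_q:      (batch, n_actions) Q-values from the teacher (assumed input).
    student_logits: (batch, n_actions) raw outputs of the student.
    temperature:    hypothetical hyperparameter; values below 1 sharpen
                    the teacher's action distribution.
    """
    p = softmax(teacher_q, temperature)  # teacher's target distribution
    q = softmax(student_logits)          # student's action distribution
    eps = 1e-12                          # avoids log(0)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(np.mean(kl))


# Illustrative batch of two states with three actions each.
teacher_q = np.array([[1.0, 2.0, 3.0], [0.5, 0.1, 0.2]])
student_logits = np.zeros_like(teacher_q)  # untrained student: uniform policy
loss = distillation_loss(teacher_q, student_logits)
```

Under these assumptions, the loss is zero when the student reproduces the teacher's softened distribution exactly and grows as the two diverge. To consolidate several task-specific teachers, the same loss could be averaged over batches drawn from each task.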
Similar resources
Knowledge Transfer for Deep Reinforcement Learning with Hierarchical Experience Replay
The process for transferring knowledge of multiple reinforcement learning policies into a single multi-task policy via distillation technique is known as policy distillation. When policy distillation is under a deep reinforcement learning setting, due to the giant parameter size and the huge state space for each task domain, it requires extensive computational efforts to train the multi-task po...
Multi-skilled Motion Control
Deep reinforcement learning has demonstrated increasing capabilities for continuous control problems, including agents that can move with skill and agility through their environment. An open problem in this setting is that of developing good strategies for integrating or merging policies for multiple skills, where each individual skill is a specialist in a specific skill and its associated stat...
Progressive Reinforcement Learning with Distillation for Multi-Skilled Motion Control
Deep reinforcement learning has demonstrated increasing capabilities for continuous control problems, including agents that can move with skill and agility through their environment. An open problem in this setting is that of developing good strategies for integrating or merging policies for multiple skills, where each individual skill is a specialist in a specific skill and its associated stat...
Greener Solvent Selection, Solvent Recycling and Optimal Control for Pharmaceutical and Bio-processing Industries
This paper proposes the simultaneous integration of environmentally benign solvent selection (chemical synthesis), solvent recycling (process synthesis) and optimal control for the separation of azeotropic systems using batch distillation. The previous work performed by Kim et al. (2004) combines the chemical synthesis and process synthesis under uncertainty. For batch distillation, optimal ope...
Inferential Estimation for a Ternary Batch Distillation
A Kalman filter (KF) estimator has been formulated using a sequence of reduced-order models representing a whole batch behavior for providing the estimates of dynamic composition in a ternary batch distillation process operated in an optimal-reflux policy. A set of full-order models is firstly obtained by linearizing around different pseudo-steady state operating conditions along batch optimal ...
Journal title:
- CoRR
Volume abs/1511.06295
Publication date 2015